-
Payment verified
-
$30K+ spent
-
Location United Kingdom
- Fixed price
- Expert
- Est. budget: $1,000.00
**Title:** Python Engineer — Extract information from SEC 10-K filings

# **Description**

We’re hiring a Python engineer to build an extractor that converts **SEC 10-K HTML filings** into structured **JSON**. The focus is on:

1. The **full Table of Contents** (TOC)
2. The **Notes to the Consolidated Financial Statements**

This is a paid pilot (10 filings). If the approach is robust, we’ll scale to other filings.

## What you’ll do

- Parse **static HTML filings** (direct EDGAR URLs).
- Output JSON conforming to our schema (to be provided).
- Capture **verbatim text** only — **no paraphrasing**.
- Build reliable **deep links** (`href`) to headings.
- Add a **QA block** with coverage metrics (e.g., toc_found, notes_found, counts, anchor accuracy).

**Notes on approach:**

- [**doc2dict**](https://github.com/john-friedman/doc2dict) is a useful starting library that generates a hierarchical block tree from filings. You may build on top of it, or propose an alternative if stronger.
- **LLM parsing is permitted but must be minimal** — it can assist with heuristics or edge cases, but **the core extraction must rely on exact text matching** to ensure fidelity.

## Must-have skills

- Strong Python (BeautifulSoup, lxml, regex).
- Experience parsing and normalizing messy HTML.
- Comfort with deterministic heuristics for TOC/Notes detection and anchor generation.

**Nice to have:** SEC/EDGAR familiarity, XBRL experience.

## Engagement

- **Type:** Fixed price for pilot (10 filings).
- **Budget:** $1,000
- **Scope:** TOC + Notes extraction; more work if quality is high.
- **Timeline:** Please propose.
- **Detailed spec:** Shared with shortlisted candidates in Notion.

---

# For the rest of the Upwork post requirements

### **Skills**

- Python
- BeautifulSoup / lxml / HTML parsing
- Regular Expressions
- JSON data structuring
- Experience with SEC/EDGAR filings (nice to have)
- XBRL familiarity (nice to have)

### **Screening Questions**

1. Briefly describe a similar parsing project you’ve done.
2. How would you approach TOC + Notes extraction while ensuring verbatim text (no paraphrasing)?
3. What experience do you have working with **SEC filings or financial regulatory data**?
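Because the brief insists on exact text matching, a minimal sketch of how TOC anchors and the Notes heading might be located with BeautifulSoup is shown below. The filing URL, regexes, and QA fields are illustrative assumptions, not the final schema or the doc2dict-based approach.

```python
# Minimal sketch (not the full deliverable): locate candidate TOC links in a
# 10-K HTML filing and emit verbatim anchor text plus a simple QA block.
# The URL, regexes, and output fields are illustrative assumptions.
import json
import re

import requests
from bs4 import BeautifulSoup

FILING_URL = "https://www.sec.gov/Archives/edgar/data/.../example-10k.htm"  # placeholder

resp = requests.get(FILING_URL, headers={"User-Agent": "research contact@example.com"})
soup = BeautifulSoup(resp.text, "lxml")

# 10-K TOCs are usually tables of <a href="#anchor"> links near the top of the filing.
toc_entries = []
for a in soup.find_all("a", href=re.compile(r"^#")):
    text = a.get_text(" ", strip=True)
    if text:  # keep verbatim text, no paraphrasing
        toc_entries.append({"title": text, "href": a["href"]})

# Very rough Notes detection: a heading whose verbatim text names the notes section.
notes_heading = soup.find(
    string=re.compile(r"Notes to (the )?Consolidated Financial Statements", re.I)
)

output = {
    "toc": toc_entries,
    "qa": {
        "toc_found": bool(toc_entries),
        "notes_found": notes_heading is not None,
        "toc_entry_count": len(toc_entries),
    },
}
print(json.dumps(output, indent=2))
```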
- Proposals: Less than 5
-
Payment unverified
-
$0 spent
-
Location Jordan
- Hourly: $5.00 - $8.00
- Intermediate
- Est. time: 1 to 3 months, Less than 30 hrs/week
We are seeking a skilled web scraping specialist who can efficiently extract data from various online sources and feed it directly into ChatGPT. The ideal candidate should have a strong understanding of web scraping techniques and experience with ChatGPT API integration. You will be responsible for ensuring that the data flows seamlessly into the ChatGPT framework, enabling it to process and utilize the information effectively. If you have a passion for data manipulation and AI, we would love to hear from you!
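As a rough illustration of the pipeline described above, here is a minimal sketch (assuming the OpenAI Python SDK) that scrapes one page and passes the text to the ChatGPT API; the source URL, model name, and prompt are assumptions to be replaced with the real sources and requirements.

```python
# Hedged sketch: fetch a page, strip it to text, and pass it to the Chat
# Completions API so ChatGPT can work with the scraped content.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI  # pip install openai

page = requests.get("https://example.com/source-page", timeout=30)  # placeholder URL
text = BeautifulSoup(page.text, "html.parser").get_text(" ", strip=True)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model choice
    messages=[
        {"role": "system", "content": "Summarize the scraped page for downstream use."},
        {"role": "user", "content": text[:8000]},  # keep within context limits
    ],
)
print(reply.choices[0].message.content)
```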
- Proposals: Less than 5
-
Payment verified
-
$4K+ spent
-
Location Germany
- Fixed price
- Expert
- Est. budget: $1,500.00
We are looking for an experienced team of developers to build a social media enterprise crawler system that automatically discovers new accounts: influencers with at least 10,000 followers, brand accounts (identifiable even with a few hundred subscribers), and all public accounts, including their posts, followers, metadata, etc. The exact data points to be crawled are listed in the attached document. The system must be able to process several million accounts per month, permanently track several tens of thousands of accounts (for example at 10-minute intervals for new comments, new likes, new stories, etc.), and be scalable, stable, and self-healing. The system combines reverse engineering of app and web APIs, emulation of mobile apps, and web scraping, and uses legitimate APIs where available. The architecture must be fully containerized, include monitoring and self-healing, and store the data according to the rules.

Core tasks of the developer

Crawler development: Automatic capture of millions of public accounts using Python and the Scrapy framework. Ensure that each account is re-crawled only when relevant changes occur. Robust handling of changes to user interfaces or APIs, if necessary through dynamic JavaScript execution with Playwright. Implementation of rule-based error handling that reports failed jobs to Apache Kafka for reprocessing.

Discovery & classification: Automatic discovery of new accounts by Python-based bots that analyze follower lists, hashtags, and interactions, and publish new crawl tasks to Kafka. Classification into influencers (≥10k followers), brands, and general accounts is done in the downstream Java-based processing service. Prioritize relevant accounts and reconcile historical data to avoid double crawls.

Scalable infrastructure: Build a fully containerized worker architecture on Kubernetes that enables massively parallel crawling. Implement geo-redundancy through intelligent proxy management. Configure auto-scaling and ensure zero-downtime deployments through professional CI/CD pipelines. The entire infrastructure is defined as code with Terraform (Infrastructure-as-Code).

Monitoring and self-healing: Implement the industry-standard stack of Prometheus, Grafana, and Loki to track KPIs such as crawling success, errors, and API rate limits. Leverage Kubernetes's self-healing mechanisms for automatic error detection and service recovery. Create a central Grafana dashboard for manual control and adjustment of priorities.

Data management: Build a high-performance, column-oriented database with ClickHouse for fast analysis. A Java/Spring Boot service processes the raw data from Kafka, then validates, structures, and stores it. Implement versioning and de-duplication of profile data. Provide a secure, documented Java-based API for data access and further processing.

Team structure: The project requires a team of three developers. The roles are: a Lead Backend Engineer / System Architect for the overall architecture and Java backend logic, a DevOps engineer for Kubernetes, Terraform, Kafka, and CI/CD, and a backend engineer specializing in the crawling and discovery modules in Python/Scrapy. All team members must have extensive experience with distributed systems, the technologies mentioned, and large-volume crawling pipelines.

Timetable and schedule: Development is divided into two phases (total duration: 6 weeks). Phase 1 (weeks 1–3): Minimum Viable Product (MVP), covering the core functions for discovery, crawling (Scrapy), the basic data pipeline with Kafka and ClickHouse, and basic monitoring. Phase 2 (weeks 4–6): V1.0 (finalization and stabilization), implementing self-healing mechanisms, advanced stability, automatic scaling, geo-redundancy, the detailed Grafana dashboard, the final data API (Java), and versioning and de-duplication.

Technical requirements: Expertise in reverse engineering of app and web interfaces, API-based data extraction, and mobile app emulation. Languages & frameworks: a polyglot architecture of Python (with Scrapy & Playwright) for data collection and Java (with Spring Boot) for data processing and API services. Infrastructure: experience with containerization (Docker), orchestration (Kubernetes), and Infrastructure-as-Code (Terraform). Data streaming: in-depth experience with Apache Kafka as the central data backbone. Database: demonstrable ability to build and operate a powerful, replicated, and hardened database such as ClickHouse. Observability: practical experience with Prometheus, Grafana, and Loki. Scaling: building a system that continuously monitors 30,000 profiles and processes millions of accounts every month, without manual lists.

Remuneration: Fixed price of $1,500, plus a performance bonus of up to $750 if the system is stable, documented, and fully functional.

Expected results: A fully functional crawling system with detection, classification, crawler engine, robust selector handling, failure and integrity layers, monitoring, self-healing, containerization, automatic scaling, geo-redundancy, a high-performance database, and an API. Fully documented code, deployment guides (runbooks), and test plans for load, regression, and stability. The system must continuously monitor new content, support versioning, provide deduplicated data, and allow access through the API. We estimate the project will require about 400 hours of work for the first implementation. We prefer freelancers from countries with lower rates and have used an hourly rate of about $7 for the calculation, resulting in the fixed price for the project. This is an opportunity for the team: we have planned further work in the long term and can pay higher remuneration in the future. The team should see this as a chance to design and build a large, scalable system from scratch. The ongoing operating costs for the crawling system cannot exceed $1,000 per month. These costs cover infrastructure, proxy and network providers, storage, APIs, and other services required for data collection and operation of the system; they do not include developers' salaries. To stay within this limit, all design decisions, crawling strategies, and infrastructure components must be optimized for maximum cost efficiency.
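As one concrete illustration of the rule-based error handling mentioned above (failed crawl jobs reported to Apache Kafka for reprocessing), here is a minimal Python sketch using kafka-python; the broker address, topic name, and payload fields are assumptions, not the final contract.

```python
# Sketch: serialize a failed crawl job and publish it to a Kafka topic so a
# downstream worker can pick it up for reprocessing.
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def report_failed_job(account_id: str, reason: str, attempt: int) -> None:
    """Push a failed crawl job onto the retry topic (topic name is hypothetical)."""
    producer.send(
        "crawl.failed-jobs",
        {"account_id": account_id, "reason": reason, "attempt": attempt},
    )

report_failed_job("acct_123", "rate_limited", attempt=2)
producer.flush()
```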
- Proposals: Less than 5
- Number of freelancers needed: 3
-
Payment verified
-
$0 spent
-
Location Germany
- Hourly: $18.00 - $42.00
- Expert
- Est. time: Less than 1 month, Less than 30 hrs/week
"I need to scrape^H a website. I will send the login after hiring. In the website, different fields and drop downs have to be filled. Once this is done, it is possible to download excel files. The script should download all the excel files and store them locally."
- Proposals: Less than 5
-
Payment verified
-
$10K+ spent
-
Location Germany
- Fixed price
- Intermediate
- Est. budget: $30.00
We are seeking a skilled freelancer to scrape data from an ecommerce page. The ideal candidate will have experience with web scraping tools and techniques to efficiently gather and organize data. It's a huge ecommerce page. I will need product data from category pages. So product name, price, reviews, image, link.
- Proposals: 20 to 50
-
Payment verified
-
$3K+ spent
-
Location United Kingdom
- Fixed price
- Intermediate
- Est. budget: $50.00
I need www.hiflo.com/catalogue to be scraped. The “Search for models” form should be used to record all available model options for bikes. There are almost 10k oil filter entries and around 5k air filters. The end result should be two CSV tables, one for air filters and one for oil filters. The columns should be CC, Make, Model, Year, SKU and ExtraInfo. Some models will have more than one filter option; in that case a separate row should be created. Some filters will have additional info like “1st filter” or “2nd filter” - these should go into the ExtraInfo column.
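To make the requested output concrete, here is a small sketch of the CSV side only: the column set comes from the brief, while the sample rows and SKUs are illustrative, and the actual scraping of the “Search for models” form would feed rows into these writers.

```python
# Sketch: write oil- and air-filter results to two CSVs with the exact columns
# requested; one model with multiple filter options becomes multiple rows.
import csv

COLUMNS = ["CC", "Make", "Model", "Year", "SKU", "ExtraInfo"]

def write_filter_csv(path: str, rows: list[dict]) -> None:
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)

# Illustrative rows only; real values come from the scraped catalogue.
oil_rows = [
    {"CC": "600", "Make": "Honda", "Model": "CBR600RR", "Year": "2019",
     "SKU": "HF204", "ExtraInfo": ""},
    {"CC": "600", "Make": "Honda", "Model": "CBR600RR", "Year": "2019",
     "SKU": "HF204RC", "ExtraInfo": "2nd filter"},
]
write_filter_csv("oil_filters.csv", oil_rows)
write_filter_csv("air_filters.csv", [])
```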
- Proposals: 20 to 50
-
Payment verified
-
$10K+ spent
-
Location Canada
- Hourly: $15.00 - $45.00
- Intermediate
- Est. time: 1 to 3 months, Not sure
Project Overview: I own FullSizeChevy.com, a long-running Chevy/GMC truck community that we are currently relaunching. Much of the original forum content is still accessible via the Internet Archive (Wayback Machine), but a lot was lost when Photobucket and other external image hosts went down. What I’m Looking For: - Scrape and extract archived forum threads (text, formatting, and images where available) from Wayback Machine snapshots. - Organize and deliver the extracted content in a structured, importable format (MySQL, CSV, or JSON). - Prioritize high-value threads (popular build threads, FAQs, tech writeups, how-tos). - Handle broken/missing images gracefully (note them, attempt recovery, or preserve context). - Deliver content ready for integration into a modern forum platform (we are using Discourse). Requirements: - Experience scraping websites and working with the Wayback Machine/Internet Archive. - Familiarity with forum platforms (vBulletin, phpBB, Discourse, etc.). - Ability to map old forum structures (thread → posts → users) into structured export/import. - Strong communication and documentation skills. Deliverables: - Export of 50–100 “best-of” forum threads (initial milestone). - Clear documentation of the scraping process. - Optional: recommendations for full forum scrape vs. selective content recovery. Budget: - Open to hourly or fixed-price. Please provide an estimate based on a milestone of recovering the first 50 threads. How to Apply: - Share examples of past scraping/archival recovery projects. - Describe your experience with forums or legacy database migration. - Provide an estimated timeline and cost for recovering an initial batch of 50 threads.
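For the Wayback Machine side of this, one hedged starting point is the public CDX API, which lists captures for a URL. The thread URL below is a placeholder, and the snapshot-selection policy (latest 200-status capture, deduplicated by digest) is just one possible choice.

```python
# Sketch: list archived snapshots of a forum thread via the Wayback Machine CDX
# API, then fetch the most recent capture for parsing.
import requests

THREAD_URL = "http://www.fullsizechevy.com/forum/showthread.php?t=12345"  # placeholder

cdx = requests.get(
    "http://web.archive.org/cdx/search/cdx",
    params={
        "url": THREAD_URL,
        "output": "json",
        "filter": "statuscode:200",
        "collapse": "digest",  # drop byte-identical captures
    },
    timeout=60,
)
rows = cdx.json()                      # first row is the column header
captures = rows[1:] if rows else []

if captures:
    timestamp, original = captures[-1][1], captures[-1][2]
    snapshot = requests.get(
        f"https://web.archive.org/web/{timestamp}/{original}", timeout=60
    )
    print(f"Fetched {len(snapshot.text)} characters from the latest capture")
```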
- Proposals: 15 to 20
-
Payment verified
-
$20K+ spent
-
Location United States
- Fixed price
- Intermediate
- Est. budget: $3,000.00
We need an experienced web crawler developer to build and operate a system that continuously discovers and analyzes PDFs from commercial websites. This is for ongoing compliance monitoring of software usage patterns in publicly available documents. Core Requirements • Build and operate a production-grade web crawler • Target commercial/business websites to find their public PDFs • Extract PDF metadata (specifically /Producer and /Creator fields) • Maintain accurate source URLs and discovery timestamps for each PDF • Focus on PDFs created within the last year • Operate all infrastructure independently (all proxy, server, storage costs must be included in your bid) Technical Questions (Must answer in proposal) First line of proposal: “I understand this requires real-time web crawling, not dataset analysis” 1. How would you architect a crawler to handle 500+ different company domains? 2. Describe your approach to proxy rotation and rate limiting. 3. How do you handle dynamic JavaScript sites that generate PDF links? 4. What’s your method for discovering PDFs without relying on sitemaps? To Apply, Include • Examples of production crawlers you’ve built (with scale metrics) • Your tech stack for web crawling (framework, cloud, database, proxies) • How many PDFs/day you can sustainably crawl • Brief code showing PDF metadata extraction capability (return JSON with /Producer + /Creator) • Confirmation you can run this independently without our infrastructure NOT Looking For • Common Crawl or dataset analysis approaches • One-time scraping projects • Academic research methodologies • Anyone who mentions querying S3 buckets or existing archives Deliverables • Week 1: Demo crawler with 500 PDFs from 50+ unique domains • Week 2–3: Scale to 5,000 PDFs/week • Week 4+: Steady state of 10,000+ PDFs/week • Each PDF record must include: source URL, discovery timestamp, /Producer and /Creator fields Required Experience • Building and operating web crawlers at scale • Managing distributed crawling infrastructure • Anti-detection techniques (proxy rotation, user agents, delays) • Basic PDF structure knowledge • Database design for millions of records Budget: $3,000 fixed price • Milestone 1 (Week 1): $500 – Proof of concept • Milestone 2 (Week 4): $2,500 – Full operational crawler Optional Ongoing Maintenance After Week 4, we may extend this into a monthly retainer for maintenance, updates, and scaling. Please indicate if you’re open to ongoing support. Immediate Rejection • Generic “I can do web scraping” proposals • Mentions of analyzing existing datasets • No specific crawler architecture details • Copy-pasted portfolios without addressing our requirements
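As a hedged example of the “brief code showing PDF metadata extraction capability” requested above, the following sketch downloads a PDF and returns /Producer and /Creator as JSON using pypdf; the URL is a placeholder, and the extra record fields (source URL, discovery timestamp) follow the deliverable description rather than any agreed schema.

```python
# Sketch: extract /Producer and /Creator from a remote PDF and return JSON.
import io
import json
from datetime import datetime, timezone

import requests
from pypdf import PdfReader  # pip install pypdf

def extract_pdf_record(url: str) -> str:
    resp = requests.get(url, timeout=60)
    reader = PdfReader(io.BytesIO(resp.content))
    meta = reader.metadata or {}          # may be None if no info dictionary
    record = {
        "source_url": url,
        "discovered_at": datetime.now(timezone.utc).isoformat(),
        "/Producer": meta.get("/Producer"),
        "/Creator": meta.get("/Creator"),
    }
    return json.dumps(record)

print(extract_pdf_record("https://example.com/whitepaper.pdf"))  # placeholder URL
```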
- Proposals: 5 to 10
-
Payment verified
-
$6K+ spent
-
Location Australia
- Hourly: $5.00 - $10.00
- Entry Level
- Est. time: Less than 1 month, Less than 30 hrs/week
I need a list of tradespeople and their mobile phone numbers from Hi Pages dot com dot au. We need the following information: Business Name, Mobile Number, Suburb/Post Code/State, Website, Email. We ONLY WANT BUSINESSES WITH A MOBILE NUMBER, i.e. a number that starts with 04. You just need to follow the video: https://www.loom.com/share/382dba34df1c417390dfd2c3a8c204a2?sid=aa27c3fc-bbeb-4114-aa55-41c1a84dcdfa I need 1,000 of them.
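The one hard filter in this brief is the 04 prefix; a small sketch of how that rule might be applied in Python is below. The handling of spaces and +61 prefixes is an assumption about how numbers appear on the site.

```python
# Sketch: keep only Australian mobile numbers (04 followed by eight digits).
import re

def is_au_mobile(raw: str) -> bool:
    digits = re.sub(r"\D", "", raw)        # strip spaces, dashes, brackets
    if digits.startswith("61"):            # +61 4xx xxx xxx -> 04xx xxx xxx
        digits = "0" + digits[2:]
    return bool(re.fullmatch(r"04\d{8}", digits))

assert is_au_mobile("0412 345 678")
assert is_au_mobile("+61 412 345 678")
assert not is_au_mobile("(02) 9876 5432")  # landline, excluded
```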
- Proposals: 50+
-
Payment verified
-
$900+ spent
-
Location United States
- Hourly: $10.00 - $45.00
- Intermediate
- Est. time: 1 to 3 months, 30+ hrs/week
We are looking for a skilled engineer to create a headless web crawler using Puppeteer or Selenium on Browserbase. The primary task will involve scraping listings and detail pages from specified websites and outputting the collected data in JSON or CSV format. The ideal candidate should have experience in web scraping, be proficient in JavaScript, and have a solid understanding of headless browser technologies. Attention to detail and the ability to work independently are essential.
- Proposals: 20 to 50
-
Payment verified
-
$0 spent
-
Location Belgium
- Fixed price
- Expert
- Est. budget: $350.00
Hello, I would like to collect data from the French website pagesjaunes.fr. I want to gather business listings for 75 different professions across all of France, covering every department (100 departments). The idea is to scrape pages of this type: https://www.pagesjaunes.fr/annuaire/departement/nord-59/plombiers (including all pagination) for the 100 departments. So, for example, also the page https://www.pagesjaunes.fr/annuaire/departement/pas-de-calais-62/plombiers … and 98 other similar pages, each time including pagination. … Once this is done, the same process needs to be repeated for the 74 other professions in the same way. Each time, the following information must be collected from the company pages: Url of the page scraped Company name Profession (taken from the profession being scraped) Website Facebook Address Phone number 1 Phone number 2 More info about Company name Number of reviews Rating out of 5 Service area (list activities separated by “;”) Activities (list activities separated by “;”) Services (list activities separated by “;”) Products (list activities separated by “;”) Brands (list activities separated by “;”) SIRET NAF code Staff size Legal form Company creation date Examples of pages: https://www.pagesjaunes.fr/pros/detail?code_etablissement=56626960&code_localite=L06212600&code_rubrique=629620 https://www.pagesjaunes.fr/pros/detail?code_etablissement=09286233&code_localite=D062&code_rubrique=629620
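As a hedged sketch of the fan-out this implies (100 departments times 75 professions, each with pagination), the snippet below only generates listing URLs. The department and profession slugs shown come from the examples above, and the `?page=` pagination parameter is an assumption to verify against the live site.

```python
# Sketch: generate the pagesjaunes.fr listing URLs to crawl, department by
# department and profession by profession, including paginated pages.
from itertools import product

DEPARTMENTS = ["nord-59", "pas-de-calais-62"]   # ... plus the 98 other department slugs
PROFESSIONS = ["plombiers"]                     # ... plus the 74 other profession slugs

def listing_urls(max_pages: int = 50):
    for dept, prof in product(DEPARTMENTS, PROFESSIONS):
        base = f"https://www.pagesjaunes.fr/annuaire/departement/{dept}/{prof}"
        yield base
        for page in range(2, max_pages + 1):
            yield f"{base}?page={page}"         # assumed pagination parameter

for url in list(listing_urls())[:3]:
    print(url)
```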
- Proposals: 15 to 20
-
Payment verified
-
$2K+ spent
-
Location South Korea
- Hourly: $8.00 - $25.00
- Intermediate
- Est. time: Less than 1 month, Less than 30 hrs/week
I want to scrape data from a site every week Top 10 lists And put them in a Google Sheet More info in messages
- Proposals: 10 to 15
-
Payment verified
-
$0 spent
-
Location Canada
- Fixed price
- Intermediate
- Est. budget: $100.00
Seeking a skilled data scraping specialist to extract names and email addresses from a public directory. The data needs to be organized and compiled into an Excel/CSV sheet for easy access and analysis. The public site to be scraped presents names ten at a time in card format; you click Next to go to the next ten names. The total number of members listed is approximately 3,000. Currently I visit the site, print one page at a time to OneNote, then use ChatGPT to convert the printouts into Excel CSV while checking the master list for duplicates. This method is too time-consuming for me. I have about 400 contacts in my master list so far. I'm seeking someone to either continue this process, propose a new method, or get the info with their preferred approach. The source website is public and advertises people with a uniform skill set. The site presents each contact in a uniform format.
- Proposals: 20 to 50
-
Payment verified
-
$0 spent
-
Location United States
- Hourly
- Intermediate
- Est. time: Less than 1 month, Less than 30 hrs/week
Scrape campaign data from kickstarter.com for academic research project. Project document attached outlining in detail the deliverables, process, and milestones.
- Proposals: 5 to 10
-
Payment verified
-
$200+ spent
-
Location India
- Hourly: $5.00 - $25.00
- Intermediate
- Est. time: 1 to 3 months, Less than 30 hrs/week
We are seeking a skilled data scraper to collect mobile numbers of business owners and decision makers across various industries. Your task will involve utilizing advanced scraping techniques and tools to extract accurate and relevant data from web sources. Familiarity with data privacy regulations and ethical scraping practices is essential. The collected data will be used for targeted marketing efforts. If you have a proven track record in data scraping and can deliver high-quality leads, we want to hear from you.
- Proposals: 10 to 15
-
Payment verified
-
$92 spent
-
Location Spain
- Hourly
- Intermediate
- Est. time: Less than 1 month, Less than 30 hrs/week
Description: I’m building a service for Airbnb hosts where they can input their listing ID and email, pay, and automatically receive improved versions of their property photos. I will handle the AI image enhancement part (API model + prompt). What I need from you is to build the rest of the automation flow. Scope of Work: Scraping Airbnb Photos: Extract all images from a listing using the Airbnb ID. Handle variable numbers and sizes of images (some listings may have 6, others 11+). Ensure highest resolution possible. Automation Flow in n8n: Trigger starts after payment confirmation on the landing page. Steps: scrape Airbnb images → send them to my AI API (I will provide endpoint & prompt) → collect enhanced images → deliver to host by email. Error handling for failed scrapes or API calls. Integration: Landing page collects Airbnb ID + email (via webhook/endpoint). Workflow must tie these inputs into the scraping and delivery process. Deliverables: Fully functional n8n workflow that: Accepts Airbnb ID + email input. Scrapes all listing images. Calls my provided API for enhancement. Emails enhanced images back to the host. Setup documentation and instructions for running/updating the workflow. Skills Needed: Experience with n8n or similar automation tools Web scraping expertise (dynamic pages, variable image counts, anti-bot measures) API integrations (HTTP requests, handling responses, email delivery) Bonus: experience with Airbnb scraping
- Proposals: 10 to 15
-
Payment unverified
-
$200+ spent
-
Location Belgium
- Hourly: $6.00 - $12.00
- Intermediate
- Est. time: Less than 1 month, Not sure
Job Description: We have an Excel file with around 4000 organizations listed. The file currently contains organization names, but is missing two important fields: Number of employees (in categories) Industry sector (from a fixed list) We need a freelancer to research and complete these fields accurately for all organizations. Requirements: Strong skills in web research, data scraping, or LinkedIn/Sales Navigator/Crunchbase research Attention to detail (accuracy is very important) Ability to handle large datasets (4000+ records) efficiently Deliver results in Excel (filling in the missing fields) Deliverables: Updated Excel sheet with: Employee count category Industry sector Data should be reliable and sourced from official websites, LinkedIn, or trusted databases Data Formatting Guidelines: For Employee Count, please use these categories only: 0–50 50–100 100–200 200–500 500+ For Industry Sector, please select from this list only: Business & Financial Services Industrial Production & Logistics Media & Communication Government & Public Sector Health & Research Hospitality, Technology & Innovation Chemicals & Pharmaceuticals Healthcare Sector Other Additional Info: Please specify the tools you plan to use (manual research, automation, LinkedIn, Crunchbase, etc.) Mention how long you expect this to take Budget is open to discussion based on experience and approach
- Proposals: 50+
-
Payment verified
-
$700+ spent
-
Location United Arab Emirates
- Hourly
- Intermediate
- Est. time: 1 to 3 months, 30+ hrs/week
Project type: One-time build with optional ongoing maintenance Overview Build a small web app that scrapes a target site, cleans/transforms the data, stores it, and exposes it via a minimal, user-friendly UI. Must include robust error handling, respectful scraping, and a simple deployable stack. Deliverables 1. Scraper • Handle login/cookies if needed, pagination, and dynamic pages (headless browser fallback) • Respect robots.txt, rate-limit & backoff, rotate UA; proxy support • Structured logging + retry queue; graceful failure modes 2. Parsing & ETL • Normalize data into a well-designed schema • Deduplicate, validate, and transform fields; add basic quality checks 3. Storage & API • DB: Postgres or Mongo (your recommendation with rationale) • REST (or GraphQL) endpoints with auth; pagination/filtering 4. Frontend (minimal) • Simple React/Next.js UI: list + filters + details; responsive 5. Scheduling & Deploy • Job scheduler for refresh/update (cron/worker) • Dockerized; deploy to AWS/GCP/Render/Heroku (your call) • README + infra notes; environment templating 6. Tests & Handover • Unit/integration tests for scraper + parser + API • Short Loom walkthrough of codebase Preferred stack (flexible): • Python (Scrapy/BeautifulSoup + Playwright) or Node.js (Playwright/Puppeteer) • API: FastAPI / Express / Nest • Queue: Celery/RQ (Py) or BullMQ (Node) • Cache: Redis Compliance (must-have): • Follow robots.txt & site ToS; apply respectful scraping (delays, bounded request volume) • Data-protection best practices What to include in your proposal: 1. Links to 2–3 similar scraping projects (screenshots or repos) 2. Short technical approach (1 page): anti-bot strategy, rate-limiting, parsing, data model, proposed stack 3. Milestone timeline with estimates 4. Fixed price for build + optional monthly maintenance rate 5. A small code sample that shows retry/backoff + robots.txt check Suggested milestones: • M1 Architecture & PoC (scrape one entity end-to-end) • M2 Scraper + Parser complete (target coverage + tests) • M3 API + DB finalized • M4 Minimal React UI + auth • M5 Scheduling, deploy, docs & handover Screening questions (answer briefly): 1. When a site uses Cloudflare or similar, what’s your escalation path (requests → Playwright → residential proxies, etc.)? 2. Show pseudocode for per-domain token bucket rate limiting. 3. How do you keep a scraper idempotent and dedupe records? 4. If the HTML structure shifts subtly, how do you detect and recover? 5. Postgres vs Mongo for this dataset — which and why? NDA: Target site shared after NDA.
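Screening question 2 asks for per-domain token-bucket pseudocode. One possible shape in Python (illustrative, not the required answer) is below: each domain gets its own bucket that refills at `rate` tokens per second up to `capacity`.

```python
# Sketch: per-domain token-bucket rate limiting for a scraper.
import time
from collections import defaultdict
from urllib.parse import urlparse

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

# One bucket per domain: 1 request/second sustained, bursts of up to 5.
buckets: dict[str, TokenBucket] = defaultdict(lambda: TokenBucket(rate=1.0, capacity=5))

def throttled_fetch(url: str) -> None:
    buckets[urlparse(url).netloc].acquire()
    # ... perform the actual request here
```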
- Proposals: 10 to 15
-
Payment verified
-
$0 spent
-
Location Thailand
- Hourly
- Intermediate
- Est. time: 1 to 3 months, Less than 30 hrs/week
We are seeking a skilled freelancer to compile a database of 50,000 email addresses. The ideal candidate will have experience in data collection and management, with a keen eye for detail to ensure accuracy and relevance. This project requires sourcing emails from reputable sources while adhering to privacy regulations. If you have a proven track record in compiling large datasets efficiently, we want to hear from you!
- Proposals: 15 to 20
-
Payment verified
-
$700+ spent
-
Location United States
- Hourly: $4.00 - $10.00
- Intermediate
- Est. time: 1 to 3 months, Less than 30 hrs/week
I am seeking a skilled freelancer to scrape and compile a list of leads for local gyms, chiropractors, med spas, and HVAC companies in specified target areas. The ideal candidate will have experience in lead generation and web scraping techniques. You will be responsible for gathering accurate contact information, including names, phone numbers, and email addresses. Attention to detail and ability to work independently are crucial. If you have a proven track record in similar projects, I would love to hear from you!
- Proposals: 20 to 50
-
Payment verified
-
$63 spent
-
Location South Africa
- Fixed price
- Expert
- Est. budget: $90.00
I’ve already laid the groundwork for a monitoring tool that collects the press, but the entire workflow still needs to be completed and stabilized. The final goal is simple: each day, retrieve the local, regional, national, and international editions, analyze their content (PDF, HTML, or any other sustainable format), and then deliver a report that only includes the articles containing the keywords I define myself, along with a clear summary. What is still missing today: • a robust connector (Python, Scrapy, BeautifulSoup, or other) that automatically downloads each newspaper as soon as it is published, regardless of its format; • an OCR and cleaning pipeline for scanned PDFs (Google or other); • homogeneous text extraction between PDFs and HTML pages; • keyword search that is dynamic and modifiable from a small interface (already existing); • a summarization module (for example spaCy, gensim, transformers, or simply via AI) generating a digest per article (related to the keyword) and a global summary; • scheduling (cron, Airflow, or equivalent) so that everything runs without manual intervention; • display of the article summary, its link, and other relevant information on the mini-interface. I provide: the existing project’s folder structure, some partially working scripts, and the list of newspapers to monitor. Your role is to review the architecture, optimize the code, document the installation, and validate the result over several consecutive days of publication. If you’ve already implemented similar solutions or are well-versed in NLP and scraping libraries, I’d be glad to collaborate with you. Do not take the listed price into account; the budget is flexible, but no more than $100. It’s simply about connecting different systems together. Thanks.
- Proposals: Less than 5
-
Payment verified
-
$4K+ spent
-
Location Netherlands
- Hourly: $5.00 - $10.00
- Intermediate
- Est. time: 1 to 3 months, 30+ hrs/week
**Job Description: Web Researcher for Online Publishers** We are seeking a dedicated and detail-oriented web researcher to join our dynamic team in identifying online publishers globally for our innovative WordPress plugin. As a key player in our project, you will be tasked with the critical role of compiling a comprehensive list of relevant websites that align with our target audience and objectives. This position is perfect for someone who enjoys conducting thorough research and is skilled in utilizing various techniques to uncover valuable publishing opportunities. The ideal candidate will employ a combination of scraping techniques and manual searching to gather data on potential online publishers. You will navigate through diverse online platforms, including blogs, news sites, and other digital media outlets, to identify those that could benefit from our WordPress plugin. Your responsibilities will include analyzing the content and audience of these websites to ensure they are a good fit for our product. Attention to detail is paramount in this role, as the accuracy of the compiled list will significantly impact our outreach efforts. You should be capable of efficiently managing your time and workflow to meet project deadlines while maintaining the highest standards of quality in your research. Key Responsibilities: - Conduct thorough web research to identify potential online publishers across various niches and industries. - Utilize scraping tools and manual search techniques to compile an extensive list of relevant websites. - Analyze the content and audience demographics of identified publishers to evaluate their alignment with our WordPress plugin. If you are enthusiastic about web research and possess a knack for discovering hidden gems in the online publishing world, we want to hear from you! Join us in our mission to expand our reach and enhance our WordPress plugin’s visibility through strategic partnerships with online publishers. Apply today to become an integral part of our team!
- Proposals: 5 to 10
-
Payment verified
-
$1K+ spent
-
Location United States
- Hourly: $8.00 - $20.00
- Intermediate
- Est. time: Less than 1 month, Less than 30 hrs/week
Need help scraping data on property owners who recently went into foreclosure. I have the list fed to a Google Sheet, but it only has the property address; we also need the name and phone number of the owner. Property data needs to be scraped, ownership data identified, and the owners' contact info skip-traced. Looking for a clean, lightweight way to do this automatically in Google Sheets. There are other similar court lists we want to assemble too, but let's start here.
- Proposals: 20 to 50
-
Payment verified
-
$0 spent
-
Location United States
- Hourly
- Expert
- Est. time: 1 to 3 months, Less than 30 hrs/week
Description: I need an automated system that pulls restaurant-for-sale listings (BizBuySell, BizQuest, LoopNet, etc.), stores them in Airtable, and uses AI to summarize and score deals (asking price, cash flow, rent %, location). The system should: • Pull new listings automatically (via API, scraping, or email alerts). • Send me notifications (Slack/email) when deals meet my criteria. • Store all leads in Airtable with sortable fields. • Use GPT to summarize and score deals (“Strong / Weak Candidate”). • Optional: automate outreach to brokers via email/SMS. Skills Needed: Zapier/Make, Airtable, Web Scraping, OpenAI API, Twilio/Lob (optional). Deliverables: • Functional Airtable base with automation
- Proposals: 20 to 50
-
Payment verified
-
$100+ spent
-
Location United Kingdom
- Fixed price
- Intermediate
- Est. budget: $15.00
Need Octoparse Expert – Automate Email Submission to Landing Page + IP Rotation Description: I’m looking for an experienced Octoparse expert to help me set up automation for submitting emails into a landing page form. What I need: Build a workflow in Octoparse that: Submits a list of emails (entered manually or imported). Fills out the form on my landing page and clicks “Submit.” Waits for the page reload/thank you confirmation before looping to the next email. Set up IP rotation so that the tool changes IP automatically every few seconds (to avoid detection/blocks). Requirements: Strong hands-on experience with Octoparse automation. Ability to explain and show me how the workflow is set up (short screen share or recording). Experience with proxies/IP rotation within Octoparse. Nice to have: Suggestions for making the process more efficient. Ability to troubleshoot if the landing page has protections (tokens, hidden fields, etc.). Deliverables: A working Octoparse workflow that can: Loop through emails and submit them reliably. Rotate/change IPs dynamically. Clear documentation or short video showing me how to use and modify it (so I can test and adjust).
- Proposals: Less than 5
-
Payment verified
-
$100K+ spent
-
Location United States
- Hourly: $10.00 - $35.00
- Expert
- Est. time: More than 6 months, 30+ hrs/week
Looking for a detail-oriented data analyst to help gather and organize state-level real estate and economic data across all 52 U.S. states. You should be comfortable pulling data via APIs, cleaning and structuring it in Python, and delivering organized outputs (CSV/Excel/JSON). The role requires: Experience with Python (Pandas, API calls, data cleaning). Ability to research reliable data sources when APIs aren’t available. Clear, well-structured documentation of process and code. We need this completed within 3 days. So, only apply if you are ready to start now. The processes are mostly in place, just need to pull the rest of the states and clean it up.
- Proposals: 20 to 50
-
Payment unverified
-
$0 spent
-
Location United States
- Hourly: $5.00 - $50.00
- Expert
- Est. time: Less than 1 month, Less than 30 hrs/week
I’m currently digging into the US Census API to pull the latest stats on stuff like industry, population, and more. I’m trying to find the fastest and most reliable developer who really knows this API inside out. This project’s all about setting up the right pipeline for data enrichment and provisioning — gotta get it done right! If we nail this task, there’s definitely long-term collab in the cards. We’ve got other frontend and backend projects lined up too, so it’s a great opportunity to work together more down the line!
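As one hedged example of pulling state-level figures from the Census API into pandas: the variable below (B01003_001E, ACS 5-year total population) and the 2022 dataset are illustrative choices for this kind of pipeline, and an API key can be added as a `key` parameter for higher rate limits.

```python
# Sketch: pull total population for every state from the Census ACS 5-year API
# and load it into a pandas DataFrame.
import pandas as pd
import requests

resp = requests.get(
    "https://api.census.gov/data/2022/acs/acs5",
    params={"get": "NAME,B01003_001E", "for": "state:*"},
    timeout=60,
)
rows = resp.json()                      # first row is the header
df = pd.DataFrame(rows[1:], columns=rows[0])
df = df.rename(columns={"B01003_001E": "population"})
df["population"] = df["population"].astype(int)
print(df.sort_values("population", ascending=False).head())
```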
- Proposals: 10 to 15
-
Payment verified
-
$0 spent
-
Location Cambodia
- Hourly: $20.00 - $40.00
- Intermediate
- Est. time: 1 to 3 months, Less than 30 hrs/week
We are seeking a skilled developer to create an automation tool that collects news from specified websites and automatically posts them to Facebook, Twitter, YouTube, and Telegram using N8N. The ideal candidate should be familiar with web scraping techniques and the N8N automation platform. Your expertise will help streamline our news distribution process and ensure timely updates across various social media platforms. If you have experience with API integrations and workflows, we want to hear from you!
- Proposals: 5 to 10
-
Payment verified
-
$4K+ spent
-
Location United States
- Hourly: $5.00 - $8.00
- Expert
- Est. time: More than 6 months, 30+ hrs/week
We need a highly skilled data researcher to build a premium, verified database of homeowners. This is a quality-over-quantity project. We are not looking for scraped, unverified junk data. We are looking for accurate, actionable leads for our sales team. Project Scope & Deliverables: • Target: Homeowners in the United States. (Specific cities/states will be provided). • Data Points Required for Each Lead (Fully Verified): o Property Address (Full mailing address) o Homeowner's Full Name o Verified Phone Number (Direct line or mobile, must be reachable) o Email Address (If possible to find and verify) • Crucial Requirement: The data must be meticulously verified for a high contact rate. We expect the sample to be of such quality that a high percentage of phone numbers are correct and active. • Final Format: Clean, organized Microsoft Excel (.xlsx) or Google Sheets file. Requirements for Freelancers: • Proven Experience: You MUST have prior experience in building verified homeowner, B2C, or real estate lead lists. Please share examples or case studies in your proposal. • Quality Focus: You understand the difference between raw scraped data and researched, verified information. • Own Tools: You will use your own tools, databases, and methods to source and verify this data. We will not provide access to any software. • Attention to Detail: The proposal must be well-written and specific. Generic copy-pasted proposals will be rejected immediately. ⚠️ IMPORTANT: PLEASE READ BEFORE APPLYING ⚠️ We are actively filtering for quality and will automatically reject spam. • DO NOT APPLY if you use automated bots that scrape outdated, junk data. • DO NOT APPLY with a generic, copy-pasted proposal. We will reject templates and AI-generated responses that do not answer our specific questions. • Your proposal must demonstrate your expertise and process. Low-effort bids and spam will be reported. We will select a candidate based on their proposal quality and experience.
- Proposals: 20 to 50
-
Payment unverified
-
$0 spent
-
Location India
- Fixed price
- Intermediate
- Est. budget: $56.74
I am looking for a skilled developer to build a job aggregation tool that can fetch job data from multiple platforms, including LinkedIn, Indeed, Glassdoor, Y Combinator Jobs, and JobsRights.ai. The tool should collect only jobs posted within the last 24 hours and automatically export all details to a Google Sheet, including job title, company name, location, short job description, job link, and date posted. Ideally, the solution should run daily to keep the sheet updated with fresh postings. It would be a plus if the tool supports filters for keywords (like “Software Engineer” or “Analyst”) and locations (like “USA” or “Remote”). Python or Node.js are preferred, but other efficient solutions are welcome. If you have experience with similar job aggregation or automation projects, please share examples of your past work when applying.
- Proposals: Less than 5
-
Payment verified
-
$0 spent
-
Location South Africa
- Fixed price
- Intermediate
- Est. budget: $150.00
Playwright Automation Developer (Browser Automation + Proxy Integration) Job Description: We are looking for an experienced automation developer to build a LinkedIn automation solution using Playwright. The tool will simulate real-user interactions on LinkedIn and must be reliable, scalable, and capable of handling multiple user sessions with unique configurations. Key Responsibilities: Develop automation scripts in Playwright (Node.js or Python). Implement session handling for multiple users. Integrate dedicated proxy support per user to simulate distributed usage. Add robust error handling, logging, and monitoring. Design automation flows that simulate realistic LinkedIn browsing behavior (clicks, delays, scrolling, etc.). Implement solutions to handle CAPTCHA challenges (third-party solver integration or manual fallback). Ensure the solution is modular, scalable, and maintainable. Requirements: Strong experience with Playwright (or Puppeteer/Selenium). Knowledge of browser fingerprinting and methods to reduce detection. Experience with proxy rotation and per-user proxy assignment. Familiarity with headless browser automation and stealth techniques. Solid understanding of web security & anti-bot measures. Strong problem-solving and debugging skills. Deliverables: Playwright-based LinkedIn automation framework. Documentation on setup, usage, and maintenance. Configurable system for user-based sessions + proxy assignment. Captcha-handling integration Please send similar jobs done with replies Thank you
- Proposals: 5 to 10
-
Payment verified
-
$10K+ spent
-
Location United States
- Hourly: $15.00 - $50.00
- Expert
- Est. time: 1 to 3 months, Less than 30 hrs/week
Looking for an experienced freelancer to help us scrape and compile detailed practice information for active physicians in the state of Texas. We already have a list of physician addresses, but we need the name of the practice, relevant contact details, and additional data points for our prospecting efforts. The output must also be filtered to identify and categorize practices located in zip codes above a specified median household income. Responsibilities: Automated data scraping/research solution to: Match addresses with practice names. Identify/contact information (phone number, email address, website, etc.). Validate the accuracy of scraped data. Cross-reference the data against publicly available or private databases to fill in missing fields (practice name, contact details, etc.). Categorize the final data based on zip codes with a median household income above a defined threshold. Deliver a clean, organized dataset (e.g., CSV, Excel) that is ready for use. Requirements: Experience in web scraping, data extraction, and data processing at scale. Strong knowledge of tools and programming languages (e.g., Python, R, or similar) for data scraping and automation. Familiarity with data validation techniques to ensure a high degree of accuracy. Ability to handle large datasets efficiently and work with external APIs or public data sources. Strong attention to detail and organizational skills, ensuring data is consistently formatted and cleaned. Understanding of the US healthcare landscape or experience working with medical data (preferred but not mandatory). Deliverables: A fully compiled spreadsheet or database containing: Physician practice name Corresponding address Contact information (phone number, email, website if available) Additional relevant notes or data points that add value A subset or flagged category of practices whose zip codes exceed the specified median household income threshold. A brief summary or dashboard that highlights how many unique practices and physicians were identified, how many fall within the targeted income thresholds, and any data gaps or limitations.
- Proposals: 20 to 50
-
Payment verified
-
$5K+ spent
-
Location United States
- Hourly: $10.00 - $25.00
- Intermediate
- Est. time: Less than 1 month, Less than 30 hrs/week
This project requires an experienced web scraping developer to create an automated solution for gathering event data from approximately 90-100 venue websites. The data needs to be extracted, consolidated into a standardized format, and delivered in a clean, easy-to-use Excel or CSV file. The key challenge is that each venue's website has a different layout. Additionally, some websites require clicking into the individual event to retrieve all of the necessary information (e.g., event price).
- Proposals: 20 to 50
-
Enterprise
-
Payment verified
-
$100K+ spent
-
Location United States
- Fixed price
- Est. budget: $1,500.00
I’m looking to complete Data Extraction work. The work will require 1 freelancer. I anticipate the project will last about a week.
- Proposals: 20 to 50
-
Payment verified
-
$0 spent
-
Location United States
- Fixed price
- Entry Level
- Est. budget: $400.00
Looking to scrape social media details for a large number of users. Beginners welcome. Group effort. Training will be provided for the exact tasks needed.
- Proposals: 20 to 50
-
Payment verified
-
$20K+ spent
-
Location United States
- Hourly
- Expert
- Est. time: 3 to 6 months, Less than 30 hrs/week
I'm looking to scrape contact information from various classified pages and have the scraping run on a regular, automated basis. Ideally, I'm looking for name, phone, and email, and to derive the business name and address from the email domain as well. We're using Zoho One, so this data will later be used for outreach.
- Proposals: 20 to 50
-
Payment verified
-
$30K+ spent
-
Location Belgium
- Hourly: $15.00 - $35.00
- Expert
- Est. time: 1 to 3 months, Less than 30 hrs/week
We are looking for an experienced web scraping specialist who can help us build an automated solution to extract company data from Trendstop. Scope of Work: Scrape a full list of companies from Trendstop (between 2,000 – 5,000 companies). Extract and deliver executive-level contact details (emails of directors/decision makers). Ensure the process is automated so we can reuse it when needed. Focus on companies located in Belgium and the Netherlands. Requirements: Proven experience with web scraping (Python, Scrapy, Selenium, BeautifulSoup, or similar). Knowledge of techniques to avoid blocks (rotating proxies, user agents, etc.). Experience with data cleaning and structuring (CSV/Excel/Database). Understanding of GDPR/compliance best practices when handling contact data. Deliverables: Scraped dataset (including company details and executive emails). Documentation or script that allows us to rerun the scraping process. What we expect from you: Share examples of similar scraping projects you’ve successfully completed. Explain why you are the right fit for this project. After reviewing your application, we’ll arrange a short call where we’ll walk you through the platform and requirements in more detail.
- Proposals: 20 to 50
-
Payment verified
-
$0 spent
-
Location United States
- Hourly: $6.00 - $10.00
- Intermediate
- Est. time: More than 6 months, Less than 30 hrs/week
I am an investor pursuing acquisitions of Micro-SaaS companies. I need a live system (not a 1-time spreadsheet) that continuously scrapes, enriches, and scores SaaS products across multiple ecosystems so I can identify acquisition targets (ARR $100k–$3M). Scope of Work 1. Data collection (scraping + APIs): Scrape multiple app stores: QuickBooks, Xero, Shopify, Stripe, Zapier, Google Workspace Marketplace, Microsoft AppSource, Zoho, Sage, FreshBooks, Wave. Scrape review platforms: G2, Capterra, GetApp. Scrape/monitor founder communities: IndieHackers, Product Hunt, Hacker News (Show HN), Makerlog. Each source must capture: product name, URL, vendor, reviews count, average rating, pricing info, integrations, description, last update, etc. Handle pagination, CAPTCHAs, and anti-scraping protections (Playwright/Selenium/Rotating proxies). 2. Enrichment: Extract pricing data from product websites. Enrich with Similarweb (traffic, keyword data) using my credentials. Pull founder/team size from LinkedIn, funding stage from Crunchbase. (using my credentials) 3. Scoring engine (automated): Every product scored 0–100 using weighted criteria (formula will be provided per below): Channel leverage (15%) Under-marketing (15%) Product simplicity (15%) Unit economics proxy (25%) Negotiability (10%) High-willingness-to-pay niche (20%) 4. Output & workflow: Data must flow into Google Sheets (or Airtable). quarterly cron job/refresh automatically updates all sources. Tabs required: Raw Listings, Raw Details, Derived (scores), Digest (weekly changes), A-List (≥80 score). Auto-generate a one-pager for each A-List vendor, including reviews, ratings, ARR estimate, deal hypothesis, risks, and uplift opportunities. 5. Alerts: quarterly email digest: new A-List vendors, rank/rating changes, vendors showing fatigue (no updates, stalled reviews, discounts, founder exits). Tech Requirements: Strong experience in web scraping with anti-bot measures (Playwright/Selenium, rotating proxies, CAPTCHA solving). Knowledge of APIs & enrichment (Similarweb, Crunchbase, LinkedIn, Google Sheets API). Ability to build reliable cron jobs or cloud functions (AWS Lambda, GCP Cloud Functions, or VPS). Clean, maintainable Python code (modular, config-driven). Deliverables: GitHub repo with documented Python scripts (scraping, enrichment, scoring, sheet writer). Config files (so categories/URLs can be added easily). Google Sheet with all required tabs + scoring running automatically. Deployment instructions (how to run locally + deploy to cloud). Success Criteria: System identifies at least 100 SaaS vendors on first run across 2–3 sources (e.g., G2 + QuickBooks App Store). quarterly refresh runs automatically, with deltas clearly tracked. At least 5–10 A-List one-pagers generated in first run. Screening Questions (please answer in your proposal): What scraping frameworks & anti-bot tools do you normally use (Playwright, Puppeteer, Selenium, proxy providers)? Show me an example of a scraper/dashboard you’ve built that outputs structured data into Google Sheets or Airtable. How would you design the system so new categories or sources can be added without rewriting code? How do you typically deal with CAPTCHAs and review sites with strong anti-scraping defenses? What’s your availability for ongoing maintenance after initial build? Budget: Setup (one-time, fixed): $1,500–3,000 depending on robustness. Ongoing maintenance (optional): $200–500/month for scraper updates and monitoring. ⚠️ Note: This is not manual research. 
If you plan to Google and paste into a spreadsheet, please do not apply.
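As a sketch of the scoring engine using the weights listed above, the snippet below assumes each criterion is scored 0-100 upstream and takes the weighted sum to get the 0-100 product score used for the A-List cutoff (≥ 80); the field names are assumptions, not the provided formula.

```python
# Sketch: weighted 0-100 scoring of a SaaS acquisition candidate.
WEIGHTS = {
    "channel_leverage": 0.15,
    "under_marketing": 0.15,
    "product_simplicity": 0.15,
    "unit_economics_proxy": 0.25,
    "negotiability": 0.10,
    "willingness_to_pay_niche": 0.20,
}

def score_product(criteria: dict[str, float]) -> float:
    """Weighted 0-100 score; missing criteria default to 0."""
    return round(sum(criteria.get(k, 0.0) * w for k, w in WEIGHTS.items()), 1)

example = {
    "channel_leverage": 70, "under_marketing": 90, "product_simplicity": 80,
    "unit_economics_proxy": 85, "negotiability": 60, "willingness_to_pay_niche": 75,
}
s = score_product(example)
print(s, "A-List" if s >= 80 else "below cutoff")
```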
- Proposals: 5 to 10
-
Payment verified
-
$400K+ spent
-
Location United States
- Hourly
- Expert
- Est. time: Less than 1 month, Less than 30 hrs/week
We want to scrape 35,661 records from this link: https://apps.coachingfederation.org/eweb/CCFDynamicPage.aspx?webcode=ccfsearch (see attachment). The data points needed are: first name, last name, email, phone, website link (if available), location (country). Please confirm the time and cost. We need the final data in an Excel sheet ordered by country (location).
- Proposals: 50+
-
Payment verified
-
$500+ spent
-
Location Tunisia
- Fixed price
- Entry Level
- Est. budget: $5.00
Job: Web scraping data from two websites and delivering it as CSVs.
- Proposals: 20 to 50
-
Payment verified
-
$100+ spent
-
Location United States
- Fixed price
- Expert
- Est. budget: $100.00
Need someone who can individually save and download all 485 pages from my website as complete HTML files (e.g. "Webpage, Complete", *.htm / *.html). I will need to refer to these later if I need to copy/paste text, download an image or video, etc.
- Proposals: 20 to 50
-
Payment verified
-
$0 spent
-
Location United States
- Hourly: $15.00 - $35.00
- Expert
- Est. time: 1 to 3 months, Less than 30 hrs/week
We are building a directory-style website and need help collecting structured data from public sources. The role involves using WebHarvy or similar scraping tools to extract information (names, addresses, phone numbers, amenities, pricing, etc.) from publicly available websites. Responsibilities: Use WebHarvy (or other tools) to extract structured data. Organize results into CSV/Excel format for bulk import into WordPress. Verify accuracy and remove duplicates. Follow provided guidelines for which websites and fields to capture. Work efficiently while respecting site terms and legal guidelines. Requirements: Experience with WebHarvy, Octoparse, ParseHub, or similar web scraping software. Strong skills in data cleaning and formatting. Familiarity with WordPress/CSV imports is a plus. Attention to detail and accuracy. Deliverables: Clean, ready-to-import CSV file(s) with all requested data fields. Documentation of scraping process (which sites, which fields, any errors). Budget: Fixed-price or hourly depending on experience. Please include examples of past projects.
- Proposals: 15 to 20
-
Payment verified
-
$20K+ spent
-
Location Germany
- Hourly
- Intermediate
- Est. time: Less than 1 month, Less than 30 hrs/week
I need several Scrapy-based crawlers built on top of my existing codebase. I will provide the target marketplaces one by one, along with access to the sample project in our GitLab repository. Full documentation will also be provided. Your task is to develop the crawlers strictly according to the documentation and use the provided codebase, since the final solution must be integrated into our infrastructure later.
- Proposals: 20 to 50
-
Payment verified
-
$40K+ spent
-
Location United States
- Fixed price
- Intermediate
- Est. budget: $50.00
We are seeking a skilled freelancer to help create a web scraper tailored to our specific needs. The ideal candidate will have experience in building efficient and reliable scraping tools that can extract data from various websites. You'll be responsible for analyzing target websites, designing the scraper architecture, and ensuring that data extraction is accurate and follows ethical guidelines. If you are knowledgeable about web scraping libraries and have a strong attention to detail, we want to hear from you!
- Proposals: 10 to 15
-
Payment verified
-
$5K+ spent
-
Location Australia
- Fixed price
- Intermediate
- Est. budget: $100.00
I need a scraper built to extract restaurant data from Uber Eats in Australia. I already have approval to use Oxylabs for proxies. Scope of work: 1. Scrape restaurant list within a given city or radius. • Required fields: • Restaurant name • Location (address / suburb) • Rating (stars) • Phone number (if available) 2. Scrape menu data: • Menu items • Prices • Modifier options Requirements: • Must integrate Oxylabs proxies (residential or scraper API). • Deliver results in CSV or Excel format. • Script should be reusable (so I can run it again for new areas). • Bonus: ability to update / refresh the dataset monthly and show new restaurant on boarded (paid extra monthly retainer). Ideal freelancer: • Experience with Python or Node.js web scraping. • Familiar with Oxylabs (or similar proxy providers). • Ability to bypass anti-bot protections and handle large-scale scraping (12,000 - 30,000 restaurants). Deliverables: • Working scraper script (Python/Node.js). • Instructions on how to run it (step-by-step guide). • Initial dataset (CSV/Excel) for 1–2 test cities.
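Since Oxylabs proxies are already approved, here is a minimal sketch of routing requests through them; the endpoint, port, and credential format are assumptions to confirm against the Oxylabs dashboard, and the Uber Eats URL is a placeholder (in practice the scraper would also handle anti-bot measures and pagination).

```python
# Sketch: route requests through an Oxylabs residential proxy endpoint.
import requests

PROXY_USER = "customer-USERNAME"      # placeholder credentials
PROXY_PASS = "PASSWORD"
PROXY = f"http://{PROXY_USER}:{PROXY_PASS}@pr.oxylabs.io:7777"  # assumed endpoint

session = requests.Session()
session.proxies = {"http": PROXY, "https": PROXY}
session.headers["User-Agent"] = "Mozilla/5.0"  # rotate realistic UA strings in practice

resp = session.get("https://www.ubereats.com/au/category/sydney-nsw", timeout=60)  # placeholder
print(resp.status_code, len(resp.text))
```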
- Proposals: 5 to 10
-
Payment verified
-
$0 spent
-
Location United Kingdom
- Fixed price
- Intermediate
- Est. budget: $200.00
We’re looking for a developer to build a tool that can scan Amazon UK product pages and extract active promotions. The output should be a CSV with: ASIN Product Name Promotions (e.g., “Apply £1.24 voucher”, “Get 3 for the price of 2”) URL Scope We’ll provide sample category/search URLs and example ASINs. The scraper should detect both flat money-off coupons and multibuy/quantity promotions from product detail pages. It can be run on our Windows VPS with a rotating residential proxy (UK). Efficiency is important — we’ll be running this often — so we’d like something that doesn’t waste bandwidth or trigger Amazon’s anti-bot systems. How you do it is up to you: Python, Node, or another language you’re comfortable with. Requests/BeautifulSoup, Playwright, Puppeteer, or another method. Light, text-only scraping or a headless browser if that’s what you prefer. What matters most is results: It reliably finds promotions, even when hidden inside Amazon’s coupon/promo widgets. It outputs clean, structured data to CSV. It can be re-run regularly without falling over. Deliverables Working script/tool we can run on our VPS CSV output in the format described above Simple instructions to run it Nice to have (but not essential) Ability to skip unchanged products between runs Error handling & retries Lightweight / low-bandwidth design To apply, please include: A short outline of how you’d approach this (don’t need deep detail). Examples of similar scraping or automation projects you’ve done. Confirmation you can extract both voucher coupons and multibuy offers reliably.
- Proposals: 5 to 10
-
Payment unverified
-
$0 spent
-
Location Spain
- Fixed price
- Intermediate
- Est. budget: $30.00
I have 100k public URLs for LinkedIn companies. I need the industry data extracted from these URLs and put into a Google Sheet, e.g. non-profit, healthcare, construction, etc.
- Proposals: 5 to 10
-
Payment verified
-
$7K+ spent
-
Location United States
- Fixed price
- Intermediate
- Est. budget: $200.00
Please extract the Names and Emails of the 1020 people listed in the link below and put them in a CSV File. The job requires going to each dealer's page on 1stdibs and finding their email on their website. An example CSV of the 1st name on the list is captured and attached. URL- https://www.1stdibs.com/seller-directory/furniture
- Proposals: 50+
-
Payment verified
-
$2K+ spent
-
Location United Kingdom
- Hourly: $30.00 - $70.00
- Expert
- Est. time: Less than 1 month, Less than 30 hrs/week
Description: I’m looking for an expert to set up a scalable, automated system that collects job postings from a large list of company career sites (I already have a growing list from Apollo — eventually 10,000+ companies). The scraped jobs must then be pushed into Airtable in a clean, structured way. Here’s what I need: Scrape jobs from company career pages (list provided). Extract clean fields: Job Title Company Location Apply URL Date Posted Job Description Push the data into Airtable (I already have the base structure). Filter and tag sustainability-related roles using logic or keyword matching — this is a key requirement, not optional. The system must be scalable (handle thousands of companies) and low-maintenance (minimal ongoing fixes when sites change). Current situation: I trialed Apify and similar tools myself, but it quickly became too complex to scale. I already use Airtable, Clay, Smartlead, Apollo, etc. — so I have supporting systems in place. I now need an expert who can own the scraping setup end-to-end and deliver something reliable. Ideal freelancer: Strong experience with web scraping frameworks (Apify, Puppeteer, Playwright, Scrapy, etc.). Experience handling ATS platforms (Workday, Greenhouse, Lever, Taleo, SAP, etc.). Comfortable integrating with Airtable via API. Ability to design sustainability keyword filters that surface only the relevant jobs. Outcome: A working pipeline that automatically pulls new jobs from my company list, filters for sustainability relevance, deduplicates them, and keeps Airtable continuously up to date.
- Proposals: 20 to 50
-
Payment verified
-
$30K+ spent
-
Location Canada
- Hourly: $20.00 - $40.00
- Intermediate
- Est. time: Less than 1 month, Less than 30 hrs/week
We are looking for an automation-focused freelancer to build a script that downloads Canadian charity annual reports from our Google Sheet, renames them, and saves them in a strict folder structure. The script will: • Read from a Google Sheet • Download only the most recent annual report per eligible charity • Save it to a folder using the charity’s BN (Business Number) and report year • Match exactly the file structure of a sample ZIP folder we’ll provide We’re starting with a short paid hourly trial (10 charities), and if that works well, we’ll move to a fixed-price milestone for the full list. This job is well-suited to an intermediate-level Python (or Node.js) developer with strong automation skills. ⸻ 🧰 Skills Required • Python (preferred) or Node.js • Google Sheets API • File/folder automation • PDF downloads + renaming • Logging, error handling • Clear communication and high attention to detail ⸻ 🔧 Phase 1: Trial Sample (Hourly) We’ll begin with a paid trial phase where you: • Write a working script to process just 10 charities • Follow the formatting and folder structure exactly • Submit the 10 processed reports in a ZIP file • Log which charities succeeded, failed, or were skipped (and why) If successful, we’ll move to the next phase. ⸻ 💰 Phase 2: Full Milestone (Fixed Price) Once the trial is approved, we’ll proceed to a fixed-price milestone where you: • Run the script on 500–1000 charities • Deliver one ZIP file containing the full set • Provide the final working script + basic README 📁 Required Output Format Your script must create folders and filenames that match this exact format: annual_reports/811116813RR0001/2024_report.pdf Where: • 811116813RR0001 is the Business Number • 2024 is the year of the report. There can only be one year (2024 or 2025 or 2023) • report.pdf is case-sensitive and exact • The folder is named after the BN only • Only one file per charity — the most recent report We will provide a sample ZIP file so you can see exactly what we expect. ⚠️ Here’s Where Others Have Gone Wrong We’ve run this job before. Many submissions failed for simple — but critical — reasons. Please read this carefully. If you make these errors, we will not be able to use your work. 🚫 Wrong document type • Do NOT include financial statements, auditor reports, newsletters, etc. • Only true annual reports are accepted 🚫 Wrong file naming • You must use this exact format: annual_reports/BN/yyyy_report.pdf • ✅ Correct: annual_reports/811116813RR0001/2024_report.pdf • ❌ Wrong: 811116813RR0001_2024.pdf, 2022_2023_Annual.pdf, AnnualReport811.pdf, etc. 
🚫 Incorrect or missing BNs • We only want BNs that end in RR0001 • Other BNs (e.g., RR0002) must be skipped 🚫 Typos or slight deviations • Any variation in naming (e.g., Report.pdf vs report.pdf, extra underscores, or wrong folder names) will break our system • No missing letters, extra characters, or abbreviation changes 🚫 Manual effort instead of scripting • This job is not just data entry — we need an automated script 🚫 No ZIP file • The output must be one ZIP file that replicates the folder structure exactly ✅ What We’ll Provide • Access to the Google Sheet: • A sample ZIP file showing exactly how your output should look • Fast review + clear feedback 📝 Deliverables (Recap) Trial Phase (Hourly): • A working script that processes 10 charities • A ZIP file with 10 correctly named folders + PDFs • A log file or CSV explaining which records were processed/skipped Full Phase (Fixed Price): • Script processes ~1000+ valid charities (will give you full list for quote) • One ZIP file with exact structure • Error handling for missing reports • Final working script + documentation ⸻ 📌 Final Notes • We need accuracy more than speed • The output must be bulletproof • You’re welcome to ask clarification questions at any point
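To make the strictest requirements concrete (BN filtering and the exact annual_reports/BN/yyyy_report.pdf path), here is a hedged sketch of just that piece; reading the Google Sheet and locating each report URL are stubbed out, and the example URL is a placeholder.

```python
# Sketch: validate the BN, enforce the exact folder/filename format, download
# the PDF, and log what was saved or skipped.
from pathlib import Path

import requests

def save_annual_report(bn: str, year: int, pdf_url: str, log: list[str]) -> None:
    if not bn.endswith("RR0001"):               # skip RR0002 etc., per the spec
        log.append(f"{bn}: skipped (BN does not end in RR0001)")
        return
    if year not in (2023, 2024, 2025):          # only one report year is allowed
        log.append(f"{bn}: skipped (year {year} out of range)")
        return
    folder = Path("annual_reports") / bn        # folder named after the BN only
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{year}_report.pdf"      # case-sensitive, exact filename
    resp = requests.get(pdf_url, timeout=120)
    resp.raise_for_status()
    target.write_bytes(resp.content)
    log.append(f"{bn}: saved {target}")

log: list[str] = []
save_annual_report("811116813RR0001", 2024, "https://example.org/report.pdf", log)  # placeholder URL
print("\n".join(log))
```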
- Proposals: 50+